On an 800x600 pixel canvas, display the names of 5 objects. When the user says the name of an object, display that object on the canvas. Choose common objects, since common words are easier for voice recognition to pick up.
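One possible way to lay out the five names on the canvas is sketched below. The helper names (labelPositions, drawLabels), the font, and the row position are assumptions made for illustration, not part of the assignment.

```javascript
const CANVAS_WIDTH = 800;
const CANVAS_HEIGHT = 600;

// Evenly space one label per word along a single row of the canvas.
function labelPositions(words, width = CANVAS_WIDTH, y = 40) {
  const slot = width / words.length;
  return words.map((word, i) => ({ word, x: slot * i + slot / 2, y }));
}

// Draw the word list; ctx is expected to be a CanvasRenderingContext2D,
// e.g. document.getElementById("canvas").getContext("2d") (hypothetical id).
function drawLabels(ctx, words) {
  ctx.clearRect(0, 0, CANVAS_WIDTH, CANVAS_HEIGHT);
  ctx.font = "24px sans-serif"; // assumption: any readable font works
  ctx.textAlign = "center";
  for (const { word, x, y } of labelPositions(words)) {
    ctx.fillText(word, x, y);
  }
}
```

With five words on an 800 pixel wide canvas, each word gets a 160 pixel slot and is centered inside it.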
Offer the users:
Have a button with the value "Speak". When the user clicks the button, its value changes to "Stop".
Have your program respond to the names of the objects.
Show instructions on the screen that cover the following, written as instructions for a user:
Use Text to Speech to have the program speak back to the user.
When the user presses the Speak button, have the program listen and stop when the user stops speaking (which is automatic) or if they press "Stop".
Display "unknown" if the object is not listed (or the speech is misunderstood).
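The matching and speak-back steps above could be sketched roughly as follows. The object list and the function names matchObject and speak are hypothetical. The speech call uses the browser's speechSynthesis API and is skipped where it is not available.

```javascript
// Hypothetical list of five common objects; swap in your own.
const OBJECTS = ["apple", "ball", "car", "house", "tree"];

// Return the first entry of `objects` that appears as a word in the
// transcript, or "unknown" when none of them does.
function matchObject(transcript, objects = OBJECTS) {
  const words = String(transcript).toLowerCase().match(/[a-z]+/g) || [];
  const found = objects.find((name) => words.includes(name));
  return found || "unknown";
}

// Speak a reply with Text to Speech; does nothing outside a browser
// that supports speechSynthesis.
function speak(text) {
  if (typeof window === "undefined" || !("speechSynthesis" in window)) return;
  window.speechSynthesis.speak(new SpeechSynthesisUtterance(text));
}
```

A result handler might then call matchObject on the recognized text, draw the returned object (or the word "unknown") on the canvas, and pass the same word to speak.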
https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition/onresult
Set continuous to false, and start the recognition again when it ends if you want to keep listening for more commands. Alternatively, try setting continuous to true, but I found it easier to restart the recognition after it stopped.
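A minimal sketch of the restart approach is shown below, assuming a browser that exposes SpeechRecognition or the prefixed webkitSpeechRecognition. The createListener helper, its callbacks, the "speak" element id, and the "en-US" language are assumptions for illustration. The constructor is passed in as a parameter so the logic can be tried outside a browser.

```javascript
// Wrap a SpeechRecognition-like constructor in a start/stop listener.
function createListener(Recognition, onTranscript, onStateChange = () => {}) {
  const recognition = new Recognition();
  recognition.continuous = false;     // one utterance per session
  recognition.interimResults = false; // final results only
  recognition.lang = "en-US";         // assumption: English commands
  let keepListening = false;

  recognition.onresult = (event) => {
    // With continuous off, the first alternative of the first result
    // holds the recognized text.
    onTranscript(event.results[0][0].transcript);
  };

  recognition.onend = () => {
    // Recognition ends after each utterance; restart it unless the
    // user has pressed Stop.
    if (keepListening) recognition.start();
    else onStateChange(false);
  };

  return {
    start() { keepListening = true; onStateChange(true); recognition.start(); },
    stop() { keepListening = false; recognition.stop(); },
  };
}

// Browser-only wiring; assumes <input type="button" id="speak" value="Speak">.
if (typeof window !== "undefined") {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  const button = document.getElementById("speak");
  if (Recognition && button) {
    let listening = false;
    const listener = createListener(
      Recognition,
      (text) => console.log("Heard:", text),
      (on) => { listening = on; button.value = on ? "Stop" : "Speak"; }
    );
    button.addEventListener("click", () => (listening ? listener.stop() : listener.start()));
  }
}
```

To stop automatically once the user finishes speaking instead of listening for more commands, the restart in onend can be dropped so the button returns to "Speak" after each utterance.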
Helpful Links:
https://shapeshed.com/html5-speech-recognition-api/
http://stephenwalther.com/archive/2015/01/05/using-html5-speech-recognition-and-text-to-speech
https://dvcs.w3.org/hg/speech-api/raw-file/9a0075d25326/speechapi.html
https://stiltsoft.com/blog/2013/05/google-chrome-how-to-use-the-web-speech-api/
https://davidwalsh.name/speech-recognition